VII. Conclusion

We now have a model that predicts a person's obesity category with reasonable accuracy from all of the features in this dataset, except for the Weight column, which gives away too much information.

Based on our earlier tests and analyses, we can conclude that our algorithm rests on a fairly solid basis, since all of the predictions applied to our group ended up being correct.

However, we know that the model is still prone to errors. Several columns bias its predictions far more than those features realistically should, because a large part of our data was artificially generated by oversampling, which leads the model to overfit on certain features.

For example, exactly 264 rows belong to Obesity_Type_III, the most dangerous category in terms of health, but they are extremely poorly distributed.

Indeed, out of these 264 rows, only one is male (about 0.4% of the class). Even though this is theoretically possible, it seems unlikely that the people surveyed were really this unevenly represented in that category. And this is only the Gender column; many other features have the same problem.
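The kind of check described above can be sketched in a few lines. This is only a minimal illustration using the standard library: the helper names (`class_feature_counts`, `minority_share`), the `label` key and the toy rows are assumptions made for the example and are not taken from our dataset or our notebook code. On a pandas DataFrame, `pd.crosstab` would give the same table directly.

```python
from collections import Counter


def class_feature_counts(rows, label_key, feature_key):
    """Count how often each (label, feature value) pair occurs."""
    return Counter((row[label_key], row[feature_key]) for row in rows)


def minority_share(counts, label, value):
    """Fraction of the rows with `label` that carry the feature value `value`."""
    total = sum(n for (lab, _), n in counts.items() if lab == label)
    return counts[(label, value)] / total if total else 0.0


# Toy rows for illustration only; they are NOT taken from the real dataset.
toy_rows = [
    {"Gender": "Female", "label": "Obesity_Type_III"},
    {"Gender": "Female", "label": "Obesity_Type_III"},
    {"Gender": "Female", "label": "Obesity_Type_III"},
    {"Gender": "Male", "label": "Obesity_Type_III"},
    {"Gender": "Male", "label": "Normal_Weight"},
    {"Gender": "Female", "label": "Normal_Weight"},
]

counts = class_feature_counts(toy_rows, "label", "Gender")
male_share = minority_share(counts, "Obesity_Type_III", "Male")
print(f"Share of males in Obesity_Type_III (toy data): {male_share:.2f}")
```

Running the same count for every categorical feature, class by class, is a quick way to spot the combinations that oversampling has left almost empty.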

In the end, oversampling the dataset to get a larger and broader view of how people are distributed across the categories was an interesting idea, but it was executed poorly and introduced too much bias into the dataset.


That being said, with our model now complete and functional, we could apply it to any person, provided they fill in their personal data for each of the required features. However, we could not find another dataset with exactly the same columns, so we only applied it to the members of our group to check how well it works.

After multiple tries, we found that our model always predicts the right category for Killian. For Louis, however, it seems to switch between Insufficient_Weight and Normal_Weight each time we generate a new model, and for Marc it switches between Normal_Weight and Overweight, so some uncertainty is still clearly visible.
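One way to quantify this instability, rather than noticing it by hand, is to retrain several times with different seeds and tally the label predicted for the same person. The sketch below is only an assumption-laden illustration: `prediction_stability`, `toy_train` and the `score` field are hypothetical names, and `toy_train` is a stand-in for real training, not our actual model. In practice, `train_fn` would wrap the training step of the notebook and pass the seed to it.

```python
import random
from collections import Counter


def prediction_stability(train_fn, sample, seeds):
    """Retrain once per seed and tally the label predicted for `sample`."""
    tally = Counter()
    for seed in seeds:
        predict = train_fn(seed)  # hypothetical: returns a predict(sample) callable
        tally[predict(sample)] += 1
    return tally


def toy_train(seed):
    """Stand-in for real training: a decision threshold that shifts with the seed."""
    rng = random.Random(seed)
    threshold = 0.5 + rng.uniform(-0.1, 0.1)

    def predict(sample):
        if sample["score"] >= threshold:
            return "Normal_Weight"
        return "Insufficient_Weight"

    return predict


borderline = {"score": 0.5}  # hypothetical person close to the decision boundary
clear_cut = {"score": 0.9}   # hypothetical person far from the boundary

borderline_tally = prediction_stability(toy_train, borderline, range(20))
clear_tally = prediction_stability(toy_train, clear_cut, range(20))
print("borderline:", dict(borderline_tally))
print("clear-cut:", dict(clear_tally))
```

A person far from the boundary gets the same label on every run, while a borderline person may be split across two neighbouring categories, which matches what we observed for our group.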

However, our group lacks diversity: all three of us belong to the Normal_Weight category, and we also have similar eating habits and physical conditions. It would have been interesting to test our algorithm on people with habits very different from ours, to see how the model performs on a wider range of subjects.

If you now wish to test our model yourself, with your own or with fictional values, you can use our API application: fill in the form with all of the data, and you will find out which obesity type our model predicts for you.

Thank you for your attention throughout this notebook. We hope that you enjoyed reading our reports and analyses!
